      HEURISTIC PATTERN MATCHING FOR THE RECOGNITION OF
         NATURAL LANGUAGE DIALOGUE EXPRESSIONS

		Kenneth Mark Colby
			and
		Roger C. Parkison


		INTRODUCTION

	To recognize something is to identify it as an instance of
the "same again". Such familiarity is possible because recurrent
characteristics of the world repeat themselves over and over again.
We shall describe an algorithm which recognizes recurrent
characteristics of natural language dialogue expressions through a
multi-stage sequence of functions which progressively transforms
these input expressions into a pattern that eventually best matches
a more abstract stored pattern. The name of the stored pattern has a
pointer to the name of a response function which decides what to do
once the input has been characterized. Here we shall discuss only
the recognizing functions, except for one response function
(anaphoric substitution) which interactively aids the
characterization process. How the response functions operate will be
described in a future communication.
	In constructing and testing a simulation of paranoid
processes, we were faced with the problem of reproducing paranoid
linguistic behavior in a diagnostic psychiatric interview. The
diagnosis of paranoid states, reactions, or modes is made by
clinicians who judge a degree of correspondence between what they
observe linguistically in an interview and their conceptual model of
paranoid behavior. There exists a high degree of agreement among
psychiatrists about this conceptual model, which relies mainly on
what an interviewee says and how he says it.
	Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. To achieve the
desired effects, the paranoid model must have the ability to deal
with the linguistic behavior of the interviewer.
	There are a number of approaches one might consider for an
ideal handling of dialogue expressions. One approach would be to
construct a dictionary of all expressions which could possibly arise
in an interview. Associated with each expression would be its
interpretation, depending on dialogue context, which in turn must
somehow be defined. For obvious reasons, no one takes this approach
seriously. Instead of an expression dictionary, one might construct
a word dictionary and then use projection rules to yield an
interpretation of a sentence from the dictionary definitions. This
has been the approach adopted by most linguistic parsers based on
transformational or systemic grammars. Such a method performs
adequately as long as the dictionary involves only a few hundred
words, each word having only one or two senses, and the dialogue is
limited to a mini-world of a few objects and relations. But the
problems which arise in a real-time psychiatric interview conducted
in unrestricted English are too great for this method to be useful
in actual dialogues in which immediacy of response is an important
requirement.
	There is little consensus about how humans process natural
language. They can be shown to possess some knowledge of grammar
rules, but this does not entail that they use a grammar in
interpreting and producing language. Irregular verb tenses and noun
plurals do not follow rules, yet people use thousands of them. One
school of linguistics believes that people possess full
transformational grammars for processing language. In our view this
position seems dubious. Language is what is recognized, but the
processes involved may not be at all linguistic. Transformational
grammars were not originally designed to "understand" a large subset
of English; they represented a set of axioms for deciding whether a
string is "grammatical". Efforts to use them for other purposes have
not been very fruitful.
	An analysis of what one's problem actually is should guide
the selection or invention of methods appropriate to that problem's
solution. Our problem was not to develop a consistent and general
theory of language, nor to assert empirically testable hypotheses
about how people process language. Our problem was to design an
algorithm which recognizes what is being said in a dialogue and what
is being said about it, in order to make a response such that a
sample of I-O pairs from the paranoid model is judged similar to a
sample of I-O pairs from paranoid patients. From the perspective of
artificial intelligence as an attempt to get computers to perform
human intellectual tasks, new methods had to be devised for the task
of participating in a human dialogue in a paranoid-patient-like way.
We are not making an existence claim that our strategy represents
the way people process language. We sought effective methods which
could operate efficiently in real time. We would not deny that our
methods for this task are possibly analogous to the methods humans
use. And since our methods provide a general way of many-to-one
mapping from "surface" expressions to a single stored pattern, these
methods can be used by any type of "host" system which takes natural
language as input.
	To perform the task of managing communicative uses and
effects of natural language, we adopted a strategy of transforming
the input until a pattern is achieved which matches, completely or
partially, a more abstract stored pattern. This strategy has proved
adequate for our purposes a satisfactory percentage of the time. (No
one expects any method to be successful 100% of the time, since not
even humans, the best natural language processors around, achieve
this level of performance.) The power of this method for natural
language dialogues lies in its ability to ignore unrecognizable
expressions and irrelevant details. A conventional parser doing
word-by-word, parts-of-speech analysis fails when it cannot find one
or more of the input words in its dictionary. It is too fragile for
a dialogue in that it must know every word.
	In early versions of the paranoid model (PARRY1), many (but
not all) of the pattern recognition mechanisms were weak because
they allowed the elements of the pattern to be order-independent
(Colby, Weber, and Hilf, 1971). For example, consider the following
expressions:
	(1) WHERE DO YOU WORK?
	(2) WHAT SORT OF WORK DO YOU DO?
	(3) WHAT IS YOUR OCCUPATION?
	(4) WHAT DO YOU DO FOR A LIVING?
	(5) WHERE ARE YOU EMPLOYED?
In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
(There is some doubt this even qualifies as a pattern, since
interrelations between words are ignored and only their presence is
considered.) An insensitivity to word order has the advantage that
lexical items representing different parts of speech can represent
the same concept, e.g. "work" as noun or as verb. But a price is
paid for this resilience and elasticity. We found from experience
that, since English relies heavily on word order to convey the
meaning of its messages, the penalty of misunderstanding (to be
distinguished from ununderstanding) was too great. Hence in PARRY2,
as will be described shortly, all the patterns require a specified
word order, as the sketch below illustrates.
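	A minimal Python sketch of the contrast (the function names
and word lists are ours, for illustration only; PARRY2's own
matching, described under MATCHING AND RECYCLING below, requires the
full ordered segment):

    def parry1_style_match(words, contentives):
        # PARRY1: respond as a question about one's work whenever a
        # contentive co-occurs with YOU or YOUR, in any order.
        return (any(c in words for c in contentives)
                and ("YOU" in words or "YOUR" in words))

    def in_order_match(words, pattern):
        # PARRY2: every pattern element must appear, and in the
        # order the stored pattern specifies.
        remaining = iter(words)
        return all(element in remaining for element in pattern)

    # in_order_match(["WHERE", "DO", "YOU", "WORK"],
    #                ["WHERE", "YOU", "WORK"])   -> True
    # in_order_match(["WORK", "YOU", "WHERE"],
    #                ["WHERE", "YOU", "WORK"])   -> False
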
	It seems agreed that for high-complexity problems it is
useful to have constraints. Diagnostic psychiatric interviews (and
especially those conducted over teletypes) have several natural
constraints. First, clinicians are trained to ask certain questions
in certain ways. These stereotypes can be treated as special cases.
Second, only a few hundred standard topics are brought up by
interviewers, who are trained to use everyday expressions and
especially those used by the patient himself. When the interview is
conducted over teletypes, expressions tend to be shortened since the
interviewer tries to increase the information transmission rate over
the slow channel of a teletype. (It is said that short expressions
are more grammatical, but think about the phrase "Now now, there
there.") Finally, teletyped interviews represent written speech.
Speech is known to be 50% redundant; hence many unrecognized words
can be ignored without losing the meaning of the message. Written
speech is loaded with idioms, cliches, pat phrases, etc., all easy
prey for a pattern-recognition approach. It is time-wasting and
usually futile to try to decode an idiom by analyzing the meanings
of its individual words.
	We shall now describe the pattern recognition functions of
the algorithm in some detail.

		PREPROCESSING

	Each word in the input expression is first looked up in a
dictionary of (currently) 1600 entries which, for the sake of speed,
is maintained in core during run-time. The dictionary, which was
built empirically from thousands of teletyped interviews with
previous versions of the model, consists of words, groups of words,
and names of the word-classes they can be translated into. The size
of the dictionary is determined by the patterns, i.e. each
dictionary word appears in one or more patterns. Entries in the
dictionary reflect PARRY2's main interests. If a word in the input
is not in the dictionary, it is dropped from the pattern being
formed. Thus if the input were:
	WHAT IS YOUR CURRENT OCCUPATION?
and the word "current" is not in the dictionary, the pattern at this
stage becomes:
	( WHAT IS YOUR OCCUPATION )
The question mark is thrown away as redundant, since questions are
recognized by word order. (A statement followed by a question mark
(YOU GAMBLE?) is considered to be communicatively equivalent in its
effects to that statement followed by a period.) Synonymic
translations of words are made so that the pattern becomes, for
example:
	( WHAT BE YOU JOB )
Groups of words are translated into a single word-class name, so
that, for example, "for a living" becomes "job".
	Misspellings are also handled in the dictionary by simply
rewriting a recognized misspelling into its correct form. Thus
"yyou" becomes "you". The common misspellings were gathered from
over 4000 interviews with earlier versions of the paranoid model.
Other misspellings do not appear in the pattern simply because they
are not represented in the dictionary.
	Certain juxtaposed words are contracted into a single word,
e.g. "GET ALONG WITH" becomes "GETALONGWITH". This is done to deal
with groups of words which are represented as a single element in
the stored pattern, thereby preventing segmentation from occurring
at the wrong places, such as at a preposition inside an idiom.
Besides these contractions, certain expansions are made so that, for
example, "DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
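	The preprocessing stages above can be summarized in a short
Python sketch. The dictionary entries shown are a tiny illustrative
sample, not PARRY2's actual 1600 empirically gathered entries:

    EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}
    GROUPS = {("GET", "ALONG", "WITH"): "GETALONGWITH",
              ("FOR", "A", "LIVING"): "JOB"}
    DICTIONARY = {"YYOU": "YOU",         # misspelling -> correct form
                  "WHAT": "WHAT", "IS": "BE", "ARE": "BE",
                  "YOU": "YOU", "YOUR": "YOU",
                  "WORK": "JOB", "OCCUPATION": "JOB", "JOB": "JOB"}

    def preprocess(text):
        # Drop the question mark; questions are recognized by word
        # order, not punctuation.
        words = text.upper().replace("?", "").split()
        # Expansions: DON'T -> DO NOT, I'D -> I WOULD, etc.
        expanded = []
        for w in words:
            expanded.extend(EXPANSIONS.get(w, [w]))
        # Contract word groups into single elements, translate the
        # rest through the dictionary, and drop unknown words.
        pattern, i = [], 0
        while i < len(expanded):
            for group, name in GROUPS.items():
                if tuple(expanded[i:i + len(group)]) == group:
                    pattern.append(name)
                    i += len(group)
                    break
            else:
                if expanded[i] in DICTIONARY:
                    pattern.append(DICTIONARY[expanded[i]])
                i += 1
        return pattern

    # preprocess("What is your current occupation?")
    #   -> ['WHAT', 'BE', 'YOU', 'JOB']
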
		SEGMENTING

	Another weakness in the crude pattern matching of PARRY1 was
that it took the entire input expression as its basic processing
unit. Hence if only two words were recognized in an eight-word
input, the risk of misunderstanding was great. We needed a way of
dealing with units shorter than the entire input expression.
	Expert telegraphists stay six to twelve words behind a
received message before transcribing it. Translators wait until they
have heard four to six words before they begin translating. Aided by
a heuristic from work in machine translation (Wilks, 1973), we
devised a way of bracketing the pattern constructed up to this point
into shorter segments, using prepositions, wh-forms, certain verbs,
etc. as bracketing points. (A complete listing of bracketing terms
will be sent upon request.) The new pattern formed is termed either
"simple", having no delimiters within it, or "compound", i.e. being
made up of two or more simple patterns. A simple pattern might be:
	( WHAT BE YOU JOB )
whereas a compound pattern would be:
	(( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows that compound
patterns from teletyped psychiatric dialogues rarely consist of more
than three or four fragments.
	After certain verbs ("THINK", "FEEL", etc.) a bracketing
occurs to replace the commonly omitted "THAT", such that:
	( I THINK YOU BE AFRAID )
becomes
	(( I THINK ) ( YOU BE AFRAID ))
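	A minimal Python sketch of the segmenter. The delimiter sets
here are small assumed samples, and the exact bracketing rules are
our reconstruction from the two examples above; the complete list of
bracketing terms (prepositions, wh-forms, certain verbs, etc.) is
much longer:

    OPENERS = {"IN", "ON", "AT", "FOR", "WHY", "WHAT", "WHERE"}
    CLOSERS = {"THINK", "FEEL"}  # verbs after which "THAT" is omitted

    def segment(pattern):
        segments, current = [], []
        for word in pattern:
            if word in CLOSERS:
                # Bracket just after the verb, where "THAT" was
                # omitted.
                current.append(word)
                segments.append(current)
                current = []
            elif word in OPENERS and current:
                # Open a new segment at a bracketing term.
                segments.append(current)
                current = [word]
            else:
                current.append(word)
        if current:
            segments.append(current)
        return segments

    # segment(["I", "THINK", "YOU", "BE", "AFRAID"])
    #   -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]
    # segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"])
    #   -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]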

		PREPARATION FOR MATCHING

	Conjunctions serve only as markers for the segmenter, and
they are dropped out after segmentation.
	Negations are handled by extracting the "NOT" from the
pattern and assigning a value to a global variable which indicates
to the algorithm that the expression is negative in form. When a
pattern is finally matched, this variable is consulted. Some
patterns have a pointer to a pattern of opposite meaning if a "NOT"
could reverse their meanings. If this pointer is present and a "NOT"
is found, then the pattern matched is replaced by its opposite, e.g.
( I not trust you ) is replaced by the pattern ( I mistrust you ).
We have not yet observed the troublesome case of "he gave me not one
but two messages". (There is no need to scratch where it doesn't
itch.)
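	A minimal Python sketch of the NOT-extraction; the
opposite-pattern table is an assumed illustration of the pointers
just described:

    OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

    def extract_not(segment):
        # Pull NOT out of the segment and flag the expression as
        # negative in form; the flag is consulted after matching.
        negative = "NOT" in segment
        return tuple(w for w in segment if w != "NOT"), negative

    def resolve(matched, negative):
        # If a NOT was found and the matched pattern points to an
        # opposite, replace it by the pattern of opposite meaning.
        if negative and matched in OPPOSITES:
            return OPPOSITES[matched]
        return matched

    # pattern, neg = extract_not(("I", "NOT", "TRUST", "YOU"))
    # resolve(pattern, neg) -> ('I', 'MISTRUST', 'YOU')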

		MATCHING AND RECYCLING

	The algorithm now attempts to match the segmented patterns
with stored patterns, which currently number about 2000, 1600 being
simple and 400 being compound. First a complete and perfect match is
sought. When a match is found, the stored pattern name has a pointer
to the name of a response function which decides what to do further.
If a match is not found, further transformations of the pattern are
carried out and a "fuzzy" match is tried.
	For fuzzy matching at this stage, we invented the heuristic
of dropping elements in the pattern one at a time and attempting a
match each time. This heuristic allows ignoring familiar words in
unfamiliar contexts. For example, "well" is important in "Are you
well?" but meaningless in "Well are you?".
	Deleting one element at a time results in, for example, the
pattern:
		( WHAT BE YOU MAIN PROBLEM )
becoming successively:
		(a) ( BE YOU MAIN PROBLEM )
		(b) ( WHAT YOU MAIN PROBLEM )
		(c) ( WHAT BE MAIN PROBLEM )
		(d) ( WHAT BE YOU PROBLEM )
		(e) ( WHAT BE YOU MAIN )
Since the stored pattern in this case matches (d), (e) would not be
constructed. We found it unwise to delete more than one element,
since our segmentation method usually yields segments containing a
small (1-4) number of words.
	Dropping an element at a time provides a probability
threshold for fuzzy matching which is a function of the length of
the segment. If a segment consists of five elements, four of the
five must be present in a particular order (with the fifth element
missing in any position) for a match to occur. If a segment contains
four elements, three must match - and so forth. A sketch of this
fuzzy match appears below.
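	In Python, the fuzzy match amounts to the following sketch,
using the example above (STORED stands in for the roughly 1600
stored simple patterns):

    STORED = {("WHAT", "BE", "YOU", "PROBLEM")}

    def fuzzy_match(segment, stored=STORED):
        segment = tuple(segment)
        if segment in stored:            # complete and perfect match
            return segment
        # Drop one element at a time and retry; at most one element
        # may be missing, so four of five must match, three of four,
        # and so forth.
        for i in range(len(segment)):
            candidate = segment[:i] + segment[i + 1:]
            if candidate in stored:
                return candidate
        return None                      # no match: default condition

    # fuzzy_match(("WHAT", "BE", "YOU", "MAIN", "PROBLEM"))
    #   -> ('WHAT', 'BE', 'YOU', 'PROBLEM')     i.e. case (d) above
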
	The transformations described above result in a progressive
coarsening of the patterns by deletion. Substitutions are also made
in certain cases. Some patterns contain pronouns which could stand
for a number of different things of importance to PARRY2. As we
mentioned in the introduction, there is one case in which the
response functions of memory aid in language recognition. One
function of memory is to keep track of the context in order to give
pronouns and other anaphoras a correct interpretation. For example,
the pattern:
	( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients,
depending on the context. When such a pattern is recognized, the
pronoun is replaced by its current anaphoric value as determined by
the response functions, and a more specific pattern such as:
	( DO YOU AVOID MAFIA )
is looked up. In many cases, the meaning of a pattern containing a
pronoun is clear without any substitution. In the pattern:
	(( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
the meaning of THEY is clarified by ( IN HOSPITAL ).
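	A minimal Python sketch of the substitution, with the
anaphoric context supplied by the response functions (the pronoun
list is an assumed sample):

    PRONOUNS = {"THEY", "THEM", "IT", "HE", "SHE"}

    def substitute(segment, context):
        # Replace each pronoun by its current anaphoric value, as
        # determined by the response functions; the resulting, more
        # specific pattern is then looked up again.
        return [context.get(w, w) if w in PRONOUNS else w
                for w in segment]

    # substitute(["DO", "YOU", "AVOID", "THEM"], {"THEM": "MAFIA"})
    #   -> ['DO', 'YOU', 'AVOID', 'MAFIA']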

		COMPOUND-PATTERN MATCH

	When more than one simple pattern is detected in the input,
a second matching is attempted. The heuristic methods used are
similar to the first matching, except they occur at the segment
level rather than at the single-element level. Certain patterns,
such as ( HELLO ) and ( I THINK ), are dropped because they are
considered meaningless. If a complete match is not found, then
simple patterns are dropped, one at a time, from the compound
pattern. This allows the input,
	(( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to match the stored pattern,
	(( HOW DO YOU COME ) ( IN HOSPITAL )).
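	In Python, the compound match mirrors the simple one, with
whole segments taking the place of single elements:

    # Segments judged meaningless and dropped before matching.
    MEANINGLESS = {("HELLO",), ("I", "THINK")}

    def compound_match(segments, stored):
        segments = [tuple(s) for s in segments
                    if tuple(s) not in MEANINGLESS]
        whole = tuple(segments)
        if whole in stored:              # complete match
            return whole
        # Drop one simple pattern at a time from the compound one.
        for i in range(len(segments)):
            candidate = tuple(segments[:i] + segments[i + 1:])
            if candidate in stored:
                return candidate
        return None                      # default condition

    # stored = {(("HOW", "DO", "YOU", "COME"), ("IN", "HOSPITAL"))}
    # compound_match([("HOW", "DO", "YOU", "COME"),
    #                 ("TO", "BE"), ("IN", "HOSPITAL")], stored)
    #   -> (('HOW', 'DO', 'YOU', 'COME'), ('IN', 'HOSPITAL'))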

	If no match can be found at this point, the algorithm has
arrived at a default condition and the appropriate response
functions decide what to do. For example, in a default condition,
the model may assume control of the interview, asking the
interviewer a question, continuing with the topic under discussion,
or introducing a new topic.
	An example of interview dialogue is presented in the
appendix, showing the transformations the input expressions undergo
and the patterns they match.

		ADVANTAGES AND LIMITATIONS

	As mentioned, one of the main advantages of a
characterization strategy is that it can ignore as irrelevant what
it does NOT recognize. There are several million words in English,
each possessing one to over a hundred senses. To construct a
machine-usable word dictionary of this magnitude is out of the
question at this time. Recognition of natural language input such as
described above allows real-time interaction in a dialogue, since it
avoids becoming ensnarled in combinatorial disambiguations and long
chains of inferencing which would slow a dialogue algorithm down to
impracticality, if it could function at all. The price paid for
pattern matching is that sometimes, but rarely, ambiguities slip
through.
	A drawback to PARRY1 was that it reacted to the first
pattern it found in the input rather than characterizing the input
as fully as possible and then deciding what to do based on a number
of tests. Another practical difficulty with PARRY1, from a
programmer's viewpoint, was that elements of the patterns were
strung out in various procedures throughout the algorithm. It was
often a considerable chore for the programmer to determine whether a
given pattern was present and precisely where it was. In the
above-described method, the patterns are all collected in one part
of the data-base where they can easily be examined.
	Concentrating all the patterns in the data-base gives PARRY2
a limited "learning" ability. When an input fails to match any
stored pattern, or matches an incorrect one as judged by a human
operator, a pattern which matches the input can be put into the
data-base automatically. If the new pattern has the same meaning as
a previously stored pattern, the human operator must provide the
name of the appropriate response function. If he doesn't remember
the name, he may try to rephrase the input in a form recognizable to
PARRY2, and it will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data-base with very little redundant effort.
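	A minimal Python sketch of this mechanism (the function,
argument, and response-function names are hypothetical):

    def add_pattern(db, new_pattern, response_name=None,
                    rephrasing=None):
        # db maps each stored pattern to its response-function name.
        if response_name is None and rephrasing is not None:
            # Let PARRY2 name the response function by recognizing
            # the operator's rephrasing with the existing patterns.
            response_name = db.get(tuple(rephrasing))
        db[tuple(new_pattern)] = response_name
        return response_name

    # db = {("WHAT", "BE", "YOU", "JOB"): "DISCUSS-WORK"}
    # add_pattern(db, ("WHAT", "BE", "YOU", "TRADE"),
    #             rephrasing=("WHAT", "BE", "YOU", "JOB"))
    #   -> 'DISCUSS-WORK'
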
	Informal observation thus far shows PARRY2's linguistic
recognition abilities to be quite superior to PARRY1's. A more
systematic and quantitative evaluation of performance will be
carried out in the future.

		REFERENCES

Colby, K.M., Weber, S., and Hilf, F.D. (1971). Artificial Paranoia.
	ARTIFICIAL INTELLIGENCE, 2, 1-25.

Wilks, Y. (1973). An artificial intelligence approach to machine
	translation. In COMPUTER MODELS OF THOUGHT AND LANGUAGE,
	Schank, R. and Colby, K.M., Eds., W.H. Freeman, San Francisco.